Abstract

In a remarkable paper published in 1976, Burnashev determined the reliability function of variable-length block codes 
over discrete memoryless channels with feedback. Subsequently, an alternative achievability proof was obtained 
by Yamamoto and Itoh via a particularly simple and instructive scheme. Their idea is to alternate between a 
communication and a confirmation phase until the receiver detects the codeword used by the sender to acknowledge 
that the message is correct. We provide a converse that parallels the Yamamoto-Itoh achievability construction. 
Besides being simpler than the original, the proposed converse suggests that a communication and a confirmation 
phase are implicit in any scheme for which the probability of error decreases with the largest possible exponent. The 
proposed converse also makes it intuitively clear why the terms that appear in Burnashev's exponent are necessary. 

Index Terms 

Burnashev's error exponent, discrete memoryless channels (DMCs), feedback, variable-length communication 



I. Introduction 

It is well known (see, e.g., [1] and [2]) that the capacity of a discrete memoryless channel (DMC) is not increased
by feedback.^1 Nevertheless, feedback can help in at least two ways: for a fixed target error probability, feedback
can be used to reduce the sender/receiver complexity and/or to reduce the expected decoding delay. An example 
is the binary erasure channel, where feedback makes it possible to implement a communication strategy that is 
extremely simple and also minimizes the delay. The strategy is simply to send each information bit repeatedly until 
it is received unerased. This strategy is capacity achieving, results in zero probability of error, and reproduces each 
information bit with the smallest delay among all possible strategies. 
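The repeat-until-unerased strategy can be illustrated with a minimal simulation sketch. The erasure probability, the seed, and the function name `send_with_feedback` are assumptions chosen for illustration; the empirical rate should come out close to the erasure-channel capacity of one minus the erasure probability (in bits per channel use).

```python
import random

def send_with_feedback(bits, erasure_prob, rng):
    """Repeat each bit over a BEC until it arrives unerased.

    Returns the decoded bits and the total number of channel uses.
    """
    decoded, uses = [], 0
    for b in bits:
        while True:
            uses += 1
            if rng.random() >= erasure_prob:   # symbol arrived unerased
                decoded.append(b)
                break                          # feedback tells the sender to move on
    return decoded, uses

rng = random.Random(0)                 # fixed seed, for illustration only
eps = 0.25                             # assumed erasure probability
bits = [rng.randint(0, 1) for _ in range(20000)]
decoded, uses = send_with_feedback(bits, eps, rng)
rate = len(bits) / uses                # information bits per channel use
print(decoded == bits, round(rate, 3), 1 - eps)
```

The decoded sequence always equals the transmitted one, which is the zero-error property mentioned above; only the number of channel uses is random.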

The reliability function — also called the error exponent — is a natural way to quantify the benefit of feedback. 
For block codes on channels without feedback the reliability function is defined as 

$$E(R) = \limsup_{T \to \infty} -\frac{1}{T} \ln P_e(\lceil e^{RT} \rceil, T), \tag{1}$$
where $P_e(M,T)$ is the smallest possible error probability of length $T$ block codes with $M$ codewords.



1 According to common practice, we say that feedback is available if the encoder may select the current channel input as a function not
only of the message but also of all past channel outputs. 

The decoding time $T$ in a communication system with feedback may depend on the channel output sequence.^2
If it does, the decoding time $T$ becomes a random variable and the notions of rate and reliability function need to
be redefined. Following Burnashev [3], in this case we define the rate as

$$R = \frac{\ln M}{E[T]}, \tag{2}$$

where $M$ is the size of the message set. Similarly we define the reliability function as

$$E_f(R) = \lim_{t \to \infty} -\frac{1}{t} \ln P_{e,f}(\lceil e^{Rt} \rceil, t), \tag{3}$$

where $P_{e,f}(M,t)$ is the smallest error probability of a variable-length block code with feedback that transmits one
of M equiprobable messages by means of t or fewer channel uses on average. As we remark below, the limit exists 
for all rates from zero to capacity. 

Burnashev showed that for a DMC of capacity $C$, the reliability function $E_f(R)$ equals

$$E_B(R) = C_1 \left(1 - \frac{R}{C}\right), \quad 0 < R < C, \tag{4}$$

where $C_1$ is determined by the two "most distinguishable" channel input symbols as

$$C_1 = \max_{x, x'} D\big(p(\cdot|x) \,\|\, p(\cdot|x')\big),$$

where $p(\cdot|x)$ is the probability distribution of the channel output when the input is $x$, and $D(\cdot\|\cdot)$ denotes the
Kullback-Leibler divergence between two probability distributions. It is remarkable that (4) determines the reliability
function exactly for all rates. In contrast, the reliability function without feedback is known exactly only for rates
above a critical rate. Below the critical rate only upper and lower bounds to the reliability function without feedback
are known. For a binary symmetric channel the situation is depicted in Fig. 1.




Fig. 1. Reliability functions for a binary symmetric channel with crossover probability 0.1. Shown is Burnashev's 
reliability function for channels with feedback (solid line) and upper and lower bounds to the reliability function 
for channels without feedback. The upper bound (dotted line) is given by the straight line bound at low rates 
and by the sphere packing bound at higher rates. The lower bound (dot-dashed line) is given by the expurgated 
bound. The upper and lower bounds coincide above the critical rate, $R_{crit}$.
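As a numerical companion to (4), the following sketch evaluates $C$, $C_1$ and $E_B(R)$ for the binary symmetric channel with crossover probability 0.1 used in Fig. 1. All quantities are in nats. The function names are hypothetical, and the divergence routine assumes strictly positive transition probabilities (that is, finite $C_1$).

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats (assumes q has no zero entries)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def c1(channel):
    """C_1 = max over input pairs (x, x') of D(p(.|x) || p(.|x'))."""
    return max(kl(px, pxp) for px in channel for pxp in channel)

def bsc_capacity(p):
    """Capacity of a BSC in nats: ln 2 - h(p)."""
    h = -p * math.log(p) - (1 - p) * math.log(1 - p)
    return math.log(2) - h

def burnashev_exponent(rate, capacity, c_one):
    """E_B(R) = C_1 (1 - R/C)."""
    return c_one * (1 - rate / capacity)

p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]        # rows are p(.|x) for x = 0, 1
C, C1 = bsc_capacity(p), c1(bsc)
print(C, C1, burnashev_exponent(C / 2, C, C1))
```

For this channel the maximizing pair is the two distinct inputs, so $C_1 = (1 - 2p)\ln\frac{1-p}{p}$, and $E_B$ falls linearly from $C_1$ at rate zero to zero at capacity.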



Burnashev showed that $E_f = E_B$ by showing that for every communication scheme

$$E[T] \ge \left(\frac{\ln M}{C} - \frac{\ln P_e}{C_1}\right)(1 - o(1)), \tag{5}$$

where $o(1)$ represents positive terms that tend to zero as $1/P_e$ tends to infinity, and that there exist schemes with

$$E[T] \le \left(\frac{\ln M}{C} - \frac{\ln P_e}{C_1}\right)(1 + o(1)), \tag{6}$$

where $o(1)$ now represents positive terms that tend to zero as both $M$ and $1/P_e$ tend to infinity.

2 If the decoding time is not fixed, in the absence of feedback the sender may not know when the receiver has decoded. This problem
does not exist if there is feedback.

For a plausibility argument that justifies (6) it suffices to summarize the achievability construction by Yamamoto
and Itoh [4]. Their scheme relies on two distinct transmission phases that we shall call the communication and the 
confirmation phase, respectively. In the communication phase the message is encoded using a fixed-length block 
code and the codeword is transmitted over the forward channel. The decoder makes a tentative decision based on 
the corresponding channel output. The encoder knows the channel output and can run the algorithm used by the 
receiver to determine the tentative decision. If the tentative decision is correct, in the confirmation phase the encoder 
sends ACK. Otherwise it sends NACK. ACKs and NACKs are sent via a fixed-length repetition code. (The code 
consists of two codewords). During the confirmation phase the decoder performs a binary hypothesis test to decide 
if ACK or NACK was transmitted. If ACK is decoded, the tentative decision becomes final and the transmission of 
the current message ends, leaving the system free to restart with a new message. If NACK is decoded, the tentative 
decision is discarded and the two phase scheme restarts with the same message. 

The overhead caused by retransmissions is negligible if the probability of decoding NACK is small. This is the
case if both the error probability of the communication phase and that of the confirmation phase are small.
Assuming that this is the case, the number of channel uses for the communication phase (including repetitions) is
slightly above $(\ln M)/C$. The probability of error is the probability that NACK is sent and ACK is decoded. In
the asymptotic regime of interest this probability is dominated by the probability that ACK is decoded given that
NACK is sent. A straightforward application of Stein's lemma [5] shows that we can make this
probability slightly less than $P_e$ (and thus achieve error probability $P_e$) by means of a confirmation code of length
slightly above $(-\ln P_e)/C_1$. Summing up, we see that we can make the error probability arbitrarily close to $P_e$ by
means of slightly more than $(\ln M)/C - (\ln P_e)/C_1$ channel uses on average. This confirms (6).
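The channel-use budget of the two phases can be tallied with a short sketch. It ignores the $o(1)$ overhead terms, uses an assumed message-set size and target error probability, and takes the BSC with crossover probability 0.1 as an example channel; the helper name `two_phase_budget` is hypothetical.

```python
import math

def two_phase_budget(num_messages, target_pe, capacity, c_one):
    """Approximate phase lengths in channel uses, ignoring o(1) terms."""
    communication = math.log(num_messages) / capacity   # about (ln M)/C
    confirmation = -math.log(target_pe) / c_one         # about (-ln P_e)/C_1
    return communication, confirmation

# Assumed example channel: BSC with crossover probability 0.1 (values in nats).
C = math.log(2) + 0.1 * math.log(0.1) + 0.9 * math.log(0.9)
C1 = 0.8 * math.log(9)

M, pe = 2**100, 1e-9                  # assumed message-set size and target error
comm, conf = two_phase_budget(M, pe, C, C1)
total = comm + conf
rate = math.log(M) / total            # R = ln M / E[T]
exponent = -math.log(pe) / total      # -ln P_e / E[T]
print(comm, conf, rate, exponent)
```

With these idealized lengths the pair (rate, exponent) lies exactly on the line $C_1(1 - R/C)$, which is the content of (4).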

To obtain the converse (5), Burnashev investigated the entropy of the a posteriori probability distribution over the
message set. He showed that the average decrease of this entropy due to an additional channel output observation,
as well as the average decrease of the logarithm of this entropy, are bounded. He used these bounds to form two
submartingales, one based on the entropy of the a posteriori distribution and the other based on the logarithm
of this entropy. He then constructed a single submartingale by patching these two together. Doob's optional
stopping theorem is then applied to this submartingale to obtain the desired bound on the expected decoding time,
which is a stopping time. Burnashev's proof is an excellent example of the power of martingales; however, both
the sophistication of the martingale construction and the use of the logarithm of entropy leave the reader with
little insight about some of the terms in the converse bound. While it is easy to see that $(\ln M)/C$ channel uses
are needed on average, it was not as clear why one needs an additional $(-\ln P_e)/C_1$ channel uses. The connection
of the latter term to binary hypothesis testing suggested the existence of an operational justification. The work
presented in this paper started as an attempt to find this operational justification.

Our converse somewhat parallels the Yamamoto-Itoh achievability scheme. This suggests that a communication
and a confirmation phase may be implicit components of any scheme for which the probability of error
decreases with the largest possible exponent. Our approach has been generalized by Como, Yüksel and Tatikonda
in [6] to prove a similar converse for variable-length block codes on finite-state Markov channels.



II. Channel Model and Variable-Length Codes as Trees

We consider a discrete memoryless channel with finite input alphabet $\mathcal{X}$, finite output alphabet $\mathcal{Y}$, and transition
probabilities $p(y|x)$. We will denote the channel input and output symbols at time $n$ by $X_n$ and $Y_n$, and denote
the corresponding vectors $(X_1, X_2, \ldots, X_n)$ and $(Y_1, Y_2, \ldots, Y_n)$ by $X^n$ and $Y^n$, respectively. A perfect causal
feedback link is available, i.e., at time $n$ the encoder knows $y^n$. (Following common practice, random variables
are represented by capital letters and their realizations are denoted by the corresponding lowercase letters.)

We will assume, without loss of generality, that the channel has no "useless output symbols", i.e., no symbols $y$
for which $p(y|x) = 0$ for every $x$. Note that for channels for which $C_1$ is infinite, the lower bound to the expected
decoding time is a restatement of the fact that feedback does not increase capacity. We will therefore restrict our
attention to channels for which $C_1 < \infty$. For such channels, $p(y|x) > 0$ for every $x$ and $y$; if not, there exist an $x$
and a $y$ for which $p(y|x) = 0$. Since $y$ is reachable from some input, there also exists an $x'$ for which $p(y|x') > 0$.
But then $D(p(\cdot|x') \| p(\cdot|x)) = \infty$, contradicting the finiteness of $C_1$. The fact that both $\mathcal{X}$ and $\mathcal{Y}$ are finite sets lets
us further conclude that for the channels of interest to this paper, there is a $\lambda > 0$ for which $p(y|x) \ge \lambda$ for every
$x$ and $y$.

A variable-length block code is defined by two maps: the encoder and the decoder. The encoder^3 functions
$f_n(\cdot,\cdot) : \mathcal{W} \times \mathcal{Y}^{n-1} \to \mathcal{X}$, where $\mathcal{W} = \{1, \ldots, M\}$ is the set of all possible messages, determine the channel
input $X_n = f_n(W, Y^{n-1})$ based on the message $W$ and on past channel outputs $Y^{n-1}$. The decoder function
$\hat{W}(\cdot) : \mathcal{Z} \to \mathcal{W}$ maps the receiver observation to a message estimate, where $\mathcal{Z}$ is the receiver observation
space until the decoding time $T$, i.e., $Y^T$ takes values in $\mathcal{Z}$.
The decoding time $T$ should be a stopping time^4 with respect to the receiver observation $Y^n$; otherwise the decision
of when to decode would depend on future channel outputs and the decoder would no longer be causal. We treat
the case when $E[T] < \infty$, and point out that what we are setting out to prove, namely (5), is trivially true when
$E[T] = \infty$.

The codes we consider here differ from non-block (also called sequential) codes with variable delay, such as 
those studied in [8] and [9]. In sequential coding, the message (typically as an infinite stream of bits) is introduced 
to the transmitter and decoded by the receiver in a progressive fashion. Delay is measured separately for each 
bit, and is defined as the time between the introduction and decoding of the bit. This is in contrast to the codes 
considered in this paper, where the entire message is introduced to the transmitter at the start of communication, and 
T measures the duration of the communication. Due to their different problem formulation, sequential codes with 
feedback have reliability functions that differ from those for variable-length block codes, just as fixed constraint 
length convolutional codes have reliability functions that differ from those of fixed-length block codes. 

The observation space $\mathcal{Z}$ is a collection of channel output sequences, and for a DMC with feedback the length
of these sequences may vary. (The length of the channel input itself may depend on the channel realization.)
Nevertheless, these sequences have the property of being prefix-free (otherwise the decision to stop would require
knowledge of the future). Thus, $\mathcal{Z}$ can be represented as the leaves of a complete $|\mathcal{Y}|$-ary tree $\mathcal{T}$ (complete in the
sense that each intermediate node has $|\mathcal{Y}|$ descendants), which has expected depth $E[T] < \infty$. Note that the decision
time $T$ is simply the first time the sequence $Y_1, Y_2, \ldots$ of channel outputs hits a leaf of $\mathcal{T}$. Furthermore we may
label each leaf of $\mathcal{T}$ with the message decoded by the receiver when that leaf is reached. This way the decoder is
completely specified by the labeled tree $\mathcal{T}$. The message statistics, the code, and the transition probabilities of the
channel determine a probability measure on the tree $\mathcal{T}$.
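The tree view of the decoder can be made concrete with a small sketch. Output sequences are represented as strings over a binary output alphabet, and the leaf-to-message table `decoder` is a hypothetical example, not a code from the paper. For a finite prefix-free set, completeness of the tree is equivalent to the Kraft sum being exactly one, which is what `is_complete` checks.

```python
from fractions import Fraction

def is_prefix_free(leaves):
    """True if no sequence in `leaves` is a proper prefix of another."""
    ordered = sorted(leaves)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))

def is_complete(leaves, alphabet_size):
    """For a finite prefix-free set, completeness is equivalent to Kraft equality."""
    kraft = sum(Fraction(1, alphabet_size ** len(s)) for s in leaves)
    return is_prefix_free(leaves) and kraft == 1

def stopping_time(outputs, leaves):
    """First n such that the output prefix y^n is a leaf (None if no leaf is hit)."""
    for n in range(len(outputs) + 1):
        if outputs[:n] in leaves:
            return n
    return None

# Hypothetical labeled tree over a binary output alphabet: leaf -> decoded message.
decoder = {"00": 1, "01": 2, "1": 2}

outputs = "0110"                       # a sample channel output sequence
T = stopping_time(outputs, decoder)    # the decoder stops as soon as a leaf is hit
print(T, decoder[outputs[:T]])
```

Because the leaves are prefix-free, the stopping decision at time $n$ depends only on $y^n$, which is the causality requirement on $T$.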

III. Binary Hypothesis Testing with Feedback 

The binary case ($M = 2$) will play a key role in our main proof. In this section we assume that the message
set contains only two elements. We will arbitrarily denote the two hypotheses by $A$ and $N$ (ACK and NACK,
respectively). We denote by $Q_A$ and $Q_N$ the corresponding probability distributions on the leaves of $\mathcal{T}$.

The following proposition bounds the Kullback-Leibler divergence $D(Q_A \| Q_N)$. It will be used in the main
result of this section to bound the error probability of binary hypothesis testing with feedback. The reader familiar
with Stein's lemma will not be surprised by the fact that the Kullback-Leibler divergence $D(Q_A \| Q_N)$ plays a key
role in binary hypothesis testing with feedback. The steps here closely parallel those in [10, Sec. III] and [11, Sec.
2.2].

3 For clarity of exposition we will only treat deterministic coding strategies here. Randomized strategies may be included without significant
modification to the core of the proof.

4 A discussion of stopping times can be found in [7, sect. 10.8].

Proposition 1: For any binary hypothesis testing scheme for a channel with feedback

$$D(Q_A \| Q_N) \le C_1 E[T | A],$$

where $T$ is the decision stopping time, $E[T] < \infty$, and $E[T | A]$ denotes the expectation of $T$ conditioned on
hypothesis $A$.

Proof: In the following, we will denote probability under hypothesis $A$ by $P_A(\cdot)$ and probability under
hypothesis $N$ by $P_N(\cdot)$. Let

$$V_n = \ln \frac{P_A(Y_1, \ldots, Y_n)}{P_N(Y_1, \ldots, Y_n)}, \tag{7}$$

so that $D(Q_A \| Q_N) = E[V_T | A]$, and the proposition is equivalent to the statement $E[V_T - C_1 T | A] \le 0$. Observe
that

$$V_n = \sum_{k=1}^{n} U_k \quad \text{where} \quad U_k = \ln \frac{P_A(Y_k | Y^{k-1})}{P_N(Y_k | Y^{k-1})}. \tag{8}$$

Note now that

$$\begin{aligned}
E[U_k | A, Y^{k-1}] &= E\left[\ln \frac{P_A(Y_k | Y^{k-1})}{P_N(Y_k | Y^{k-1})} \,\middle|\, A, Y^{k-1}\right] \\
&= E\left[\ln \frac{P_A(Y_k | X_k = f_k(A, Y^{k-1}), Y^{k-1})}{P_N(Y_k | X_k = f_k(N, Y^{k-1}), Y^{k-1})} \,\middle|\, A, X_k = f_k(A, Y^{k-1}), Y^{k-1}\right] \\
&= \sum_{y \in \mathcal{Y}} \Pr\{Y_k = y | X_k = f_k(A, Y^{k-1})\} \ln \frac{\Pr\{Y_k = y | X_k = f_k(A, Y^{k-1})\}}{\Pr\{Y_k = y | X_k = f_k(N, Y^{k-1})\}} \\
&\le C_1,
\end{aligned} \tag{9}$$

where $f_k(\cdot,\cdot)$ is the encoder function at time $k$. Consequently, $\{V_n - nC_1\}$ is a supermartingale under hypothesis
$A$. Observe that the existence of a $\lambda > 0$ for which $p(y|x) \ge \lambda$ for all $x, y$ implies that $|U_k| \le \ln \frac{1}{\lambda}$. We can now
use Doob's Optional-Stopping Theorem (see e.g. [7, Sec. 10.10]) to conclude that $E[V_T - C_1 T | A] \le 0$. ■
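Proposition 1 can be checked numerically on a toy scheme. The sketch below assumes a BSC with crossover probability 0.1, an encoder that ignores the feedback (always 0 under $A$, always 1 under $N$), and a hypothetical stopping rule that stops at the first output 1 or after a maximum depth. In this particular example every step contributes exactly $C_1$ to the divergence, so the bound is met with equality.

```python
import math

p = 0.1                                            # assumed BSC crossover probability
P = {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}     # P[x][y] = p(y|x)
C1 = (1 - 2 * p) * math.log((1 - p) / p)           # C_1 of this BSC, in nats

def leaves(max_depth):
    """Leaves of the tree: stop at the first output 1, or after max_depth outputs."""
    out = [(0,) * k + (1,) for k in range(max_depth)]
    out.append((0,) * max_depth)
    return out

def leaf_prob(leaf, x):
    """Probability of a leaf when the encoder always sends x."""
    return math.prod(P[x][y] for y in leaf)

tree = leaves(6)
QA = [leaf_prob(leaf, 0) for leaf in tree]         # hypothesis A: always send 0
QN = [leaf_prob(leaf, 1) for leaf in tree]         # hypothesis N: always send 1

divergence = sum(a * math.log(a / n) for a, n in zip(QA, QN))
expected_T_given_A = sum(a * len(leaf) for a, leaf in zip(QA, tree))
print(divergence, C1 * expected_T_given_A)
```

An encoder that sometimes sends the same symbol under both hypotheses would contribute less than $C_1$ in those steps and make the inequality strict.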

We can apply Proposition 1 to find a lower bound on the error probability of a binary hypothesis testing problem
with feedback. The bound is expressed in terms of the expected decision time.

Lemma 1: The error probability of a binary hypothesis test performed across a DMC with feedback and
variable-length codes is lower bounded by

$$P_e \ge \frac{\min\{p_A, p_N\}}{4} e^{-C_1 E[T]},$$

where $p_A$ and $p_N$ are the a priori probabilities of the hypotheses.

Proof: Each decision rule corresponds to a tree where each leaf $y^T$ is associated with a decoded hypothesis
$\hat{W}(y^T)$. Thus we can partition the leaves into two sets corresponding to the two hypotheses,

$$S = \{y^T : \hat{W}(y^T) = A\},$$
$$\bar{S} = \{y^T : \hat{W}(y^T) \ne A\},$$

where $S$ is the decision region for hypothesis $A$.

The log sum inequality [12], [13] (or data processing lemma for divergence) implies

$$D(Q_A \| Q_N) \ge Q_A(S) \ln \frac{Q_A(S)}{Q_N(S)} + Q_A(\bar{S}) \ln \frac{Q_A(\bar{S})}{Q_N(\bar{S})}. \tag{10}$$

By Proposition 1, $C_1 E[T | A] \ge D(Q_A \| Q_N)$; thus (10) can be rearranged to give

$$C_1 E[T | A] \ge -Q_A(S) \ln Q_N(S) - h(Q_A(S)), \tag{11}$$

where $h(\cdot)$ is the binary entropy function. Writing the overall probability of error in terms of marginal error
probabilities yields

$$P_e = p_N Q_N(S) + p_A Q_A(\bar{S}),$$

which allows us to bound $Q_N(S)$ as

$$Q_N(S) \le \frac{P_e}{p_N} \le \frac{P_e}{\min\{p_A, p_N\}}.$$

Substituting back into (11) yields a bound on the expected depth of the decision tree conditioned on $A$ just in terms
of $Q_A$ and the a priori message probabilities

$$C_1 E[T | A] \ge -Q_A(S) \ln \frac{P_e}{\min\{p_A, p_N\}} - h(Q_A(S)). \tag{12}$$

Following identical steps with the roles of $A$ and $N$ swapped yields

$$C_1 E[T | N] \ge -Q_N(\bar{S}) \ln \frac{P_e}{\min\{p_A, p_N\}} - h(Q_N(\bar{S})). \tag{13}$$

We can now average both sides of (12) and (13) by weighting with the corresponding a priori probabilities. If
we do so and use the facts that $p_A Q_A(S) + p_N Q_N(\bar{S})$ is the probability of making the correct decision and
$p_A Q_A(\bar{S}) + p_N Q_N(S)$ is the probability of making an error, together with the concavity of the binary entropy
function, we obtain the following unconditioned bound on the depth of the decision tree

$$\begin{aligned}
C_1 E[T] &\ge -(1 - P_e) \ln \frac{P_e}{\min\{p_A, p_N\}} - h(P_e) \\
&\ge -\ln P_e - 2h(P_e) + \ln \min\{p_A, p_N\} \\
&\ge -\ln P_e - 2 \ln 2 + \ln \min\{p_A, p_N\}.
\end{aligned}$$

Solving for $P_e$ completes the proof. ■

It is perhaps worthwhile pointing out why the factor $\min\{p_A, p_N\}$ arises: if one of the hypotheses has small a
priori probability, one can achieve an equally small error probability by always deciding for the other hypothesis,
irrespective of the channel observations.
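To get a feeling for Lemma 1, the sketch below compares the bound with the exact error probability of one simple scheme: a fixed-length repetition test over a BSC with majority decision and equal priors. This scheme is an assumed example and only a special case of the variable-length tests covered by the lemma (here $T = n$ deterministically); the function names are hypothetical.

```python
import math

def lemma1_bound(p_a, p_n, c_one, expected_T):
    """Lower bound of Lemma 1: min{p_A, p_N}/4 * exp(-C_1 E[T])."""
    return min(p_a, p_n) / 4 * math.exp(-c_one * expected_T)

def majority_error(n, p):
    """Exact error probability of majority decoding over n (odd) uses of a BSC(p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.1                                      # assumed crossover probability
C1 = (1 - 2 * p) * math.log((1 - p) / p)     # C_1 of the BSC, in nats
for n in (1, 3, 5, 7):
    print(n, majority_error(n, p), lemma1_bound(0.5, 0.5, C1, n))
```

The gap between the two columns reflects that a symmetric fixed-length test does not attain the exponent $C_1$; the lemma only states that no test can do better than the bound.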



IV. Expected Tree Depth and Channel Capacity 

Given the channel observations $y^n$, one can calculate the a posteriori probability $p_{W|Y^n}(w|y^n)$ of any message
$w \in \mathcal{W}$. Recall that a maximum a posteriori (MAP) decoder asked to decide at time $n$ when $Y^n = y^n$ will choose
(one of) the message(s) that has the largest a posteriori probability $p_{\max} = \max_w p_{W|Y^n}(w|y^n)$. The probability
of error will then be $P_e(y^n) = 1 - p_{\max}$. Similarly, we can define the probability of error of a MAP decoder for
each leaf of the observation tree $\mathcal{T}$. Let us denote by $P_e(y^T)$ the probability of error given the observation $y^T$.
The unconditioned probability of error is then $P_e = E[P_e(Y^T)]$.

For any fixed $\delta > 0$ we can define a stopping time $\tau$ as the first time that the error probability goes below $\delta$, if
this happens before $T$, and as $T$ otherwise:

$$\tau = \inf\{n : (P_e(y^n) \le \delta) \text{ or } (n = T)\}. \tag{14}$$

If $P_e(Y^\tau)$ exceeds $\delta$, then we are certain that $\tau = T$, and $P_e(Y^n) > \delta$ for all $0 \le n \le T$, so the event
$P_e(Y^\tau) > \delta$ is included in the event $P_e(Y^T) > \delta$. (We have inclusion instead of equality since $P_e(Y^\tau) \le \delta$ does
not exclude $P_e(Y^T) > \delta$.) Thus

$$\Pr\{P_e(Y^\tau) > \delta\} \le \Pr\{P_e(Y^T) > \delta\} \le \frac{P_e}{\delta}, \tag{15}$$
where the second inequality is an application of Markov's inequality.
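The MAP error probability $P_e(y^n)$ and the stopping time $\tau$ can be traced along one output sequence with a short sketch. It assumes a hypothetical two-message repetition codebook over a BSC, an encoder that ignores feedback (a special case of the model), and a decoder that stops at a fixed time $T$ equal to the length of the observed sequence.

```python
def posterior_path(codebook, prior, channel, outputs):
    """Posteriors p(w|y^n) for n = 0..len(outputs); the encoder ignores feedback."""
    post, path = list(prior), [list(prior)]
    for n, y in enumerate(outputs):
        weights = [pw * channel[cw[n]][y] for pw, cw in zip(post, codebook)]
        total = sum(weights)
        post = [w / total for w in weights]
        path.append(post)
    return path

def tau(path, delta):
    """First n with P_e(y^n) = 1 - max_w p(w|y^n) <= delta, else the final time T."""
    for n, post in enumerate(path):
        if 1 - max(post) <= delta:
            return n
    return len(path) - 1

p = 0.1                                   # assumed BSC crossover probability
bsc = [[1 - p, p], [p, 1 - p]]            # bsc[x][y] = p(y|x)
codebook = [(0, 0, 0, 0), (1, 1, 1, 1)]   # hypothetical repetition code, T = 4
path = posterior_path(codebook, [0.5, 0.5], bsc, (0, 0, 1, 0))
print([round(1 - max(post), 4) for post in path])
print(tau(path, 0.2), tau(path, 0.05), tau(path, 0.001))
```

For the smallest threshold the error probability never drops to $\delta$ before $T$, so $\tau = T$, which is the second branch of (14).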

Given a particular realization $y^n$ we will denote the entropy of the a posteriori distribution $p_{W|Y^n}(\cdot|y^n)$ as
$\mathcal{H}(W|y^n)$. Then $\mathcal{H}(W|Y^n)$ is a random variable^5 and $E[\mathcal{H}(W|Y^n)] = H(W|Y^n)$. If $P_e(y^\tau) \le \delta \le \frac{1}{2}$, then from
Fano's inequality it follows that

$$\mathcal{H}(W|y^\tau) \le h(\delta) + \delta \ln M. \tag{16}$$

The expected value of $\mathcal{H}(W|Y^\tau)$ can be bounded by conditioning on the event $P_e(Y^\tau) \le \delta$ and its complement,
then applying (16) and then (15), as follows:

$$\begin{aligned}
E[\mathcal{H}(W|Y^\tau)] &= E[\mathcal{H}(W|Y^\tau) | P_e(Y^\tau) \le \delta] \Pr\{P_e(Y^\tau) \le \delta\} + E[\mathcal{H}(W|Y^\tau) | P_e(Y^\tau) > \delta] \Pr\{P_e(Y^\tau) > \delta\} \\
&\le (h(\delta) + \delta \ln M) \Pr\{P_e(Y^\tau) \le \delta\} + (\ln M) \Pr\{P_e(Y^\tau) > \delta\} \\
&\le h(\delta) + \left(\delta + \frac{P_e}{\delta}\right) \ln M.
\end{aligned}$$
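The Fano step (16) can be sanity-checked on a concrete posterior. The sketch below uses a hypothetical posterior over four messages whose MAP error probability is below an assumed threshold $\delta$, and compares its entropy with $h(\delta) + \delta \ln M$; all entropies are in nats.

```python
import math

def entropy(dist):
    """Entropy in nats of a finite distribution."""
    return -sum(q * math.log(q) for q in dist if q > 0)

def h(x):
    """Binary entropy function in nats."""
    return entropy([x, 1 - x])

def fano_rhs(delta, num_messages):
    """Right-hand side of (16): h(delta) + delta ln M."""
    return h(delta) + delta * math.log(num_messages)

posterior = [0.9, 0.05, 0.03, 0.02]    # hypothetical posterior, MAP error 0.1
delta = 0.2                            # assumed threshold, with error <= delta <= 1/2
map_error = 1 - max(posterior)
print(map_error, entropy(posterior), fano_rhs(delta, len(posterior)))
```

The bound is loosest when the posterior is concentrated and tightest when the residual mass $\delta$ is spread evenly over the remaining messages, where the entropy equals $h(\delta) + \delta \ln(M - 1)$.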

This upper bound on the expected posterior entropy at time $\tau$ can be turned into a lower bound on the expected
value of $\tau$ by using the channel capacity as an upper bound to the expected change of entropy. This notion is made
precise by the following lemma.

Lemma 2: For any $0 < \delta \le \frac{1}{2}$,

$$E[\tau] \ge \left(1 - \delta - \frac{P_e}{\delta}\right) \frac{\ln M}{C} - \frac{h(\delta)}{C}.$$



Proof: Observe that $\{\mathcal{H}(W|Y^n) + nC\}$ is a submartingale (an observation already made in [3, Lemma 2]).
To see this,

$$\begin{aligned}
E[\mathcal{H}(W|Y^n) - \mathcal{H}(W|Y^{n+1}) | Y^n = y^n] &= I(W; Y_{n+1} | Y^n = y^n) \\
&\overset{(a)}{\le} I(X_{n+1}; Y_{n+1} | Y^n = y^n) \\
&\le C,
\end{aligned}$$

where (a) follows from the data processing inequality and the fact that $W - X_{n+1} - Y_{n+1}$ forms a Markov chain
given $Y^n = y^n$. Hence $\{\mathcal{H}(W|Y^n) + nC\}$ is indeed a submartingale. Since $\mathcal{H}(W|y^n)$ is bounded between 0 and
$\ln M$ for all $n$, and the expected stopping time $E[\tau] \le E[T] < \infty$, Doob's Optional-Stopping Theorem allows
us to conclude that at time $\tau$ the expected value of the submartingale must be greater than or equal to the initial
value, $\ln M$. Hence

$$\begin{aligned}
\ln M = H(W|Y^0) &\le E[\mathcal{H}(W|Y^\tau) + \tau C] \\
&= E[\mathcal{H}(W|Y^\tau)] + E[\tau] C \\
&\le h(\delta) + \left(\delta + \frac{P_e}{\delta}\right) \ln M + E[\tau] C.
\end{aligned}$$

Solving for $E[\tau]$ yields

$$E[\tau] \ge \left(1 - \delta - \frac{P_e}{\delta}\right) \frac{\ln M}{C} - \frac{h(\delta)}{C}.$$ ■
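The key step of the proof, that one channel use reduces the posterior entropy by at most $C$ on average, can be checked numerically. The sketch below assumes a BSC with crossover probability 0.1; `inputs[w]` is a hypothetical assignment of the next channel input to each message (the value of the encoder function for the current output history).

```python
import math

def entropy(dist):
    """Entropy in nats of a finite distribution."""
    return -sum(q * math.log(q) for q in dist if q > 0)

def expected_entropy_drop(posterior, inputs, channel):
    """One-step E[H(W|y^n) - H(W|Y^{n+1}) | y^n], which equals I(W; Y_{n+1} | y^n).

    inputs[w] is the symbol sent for message w given the past outputs.
    """
    drop = entropy(posterior)
    for y in range(len(channel[0])):
        joint = [pw * channel[x][y] for pw, x in zip(posterior, inputs)]
        py = sum(joint)                       # probability of output y given y^n
        if py > 0:
            drop -= py * entropy([j / py for j in joint])
    return drop

p = 0.1                                       # assumed crossover probability
bsc = [[1 - p, p], [p, 1 - p]]                # bsc[x][y] = p(y|x)
C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)

print(expected_entropy_drop([0.5, 0.5], [0, 1], bsc), C)
print(expected_entropy_drop([0.7, 0.2, 0.1], [0, 1, 1], bsc))
```

The first call attains $C$ because the induced input distribution is uniform, which is capacity achieving for the BSC; the second induces a biased input distribution and falls short of $C$.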



5 Notice that $\mathcal{H}(W|y^n)$ is commonly written as $H(W|Y^n = y^n)$. We cannot use the standard notation since it becomes problematic when
we substitute $Y^n$ for $y^n$ as we just did.

V. Burnashev's Lower Bound



In this section we will combine the two bounds we have established in the preceding sections to obtain a bound
on the overall expected decoding time. Lemma 2 provides a lower bound on $E[\tau]$ as a function of $M$, $\delta$ and $P_e$.
We will show that a properly constructed binary hypothesis testing problem allows us to use Lemma 1 to lower
bound the probability of error in terms of $E[T - \tau | Y^\tau]$. This in turn will lead us to the final bound on $E[T]$.

The next proposition states that a new channel output symbol cannot change the a posteriori probability of any
particular message by more than some constant factor when $C_1$ is finite.



Proposition 2: $C_1 < \infty$ implies

$$\lambda\, p(w|y^{n-1}) \le p(w|y^n) \le \frac{p(w|y^{n-1})}{\lambda},$$

where $0 < \lambda = \min_{x,y} p(y|x) \le \frac{1}{2}$.



Proof: Using Bayes' rule, the posterior may be written recursively as

$$p(w|y^n) = p(w|y^{n-1}) \frac{p(y_n | x_n = f_n(w, y^{n-1}))}{p(y_n | y^{n-1})}.$$

The quotient may be upper and lower bounded using $1 \ge p(y_n | x_n = f_n(w, y^{n-1})) \ge \lambda$ and $1 \ge p(y_n | y^{n-1}) \ge \lambda$,
which yields the statement of the proposition. ■
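The recursive Bayes update and the two-sided bound of Proposition 2 can be verified on a small example. The channel matrix, the prior, and the per-message input assignment below are hypothetical values chosen only for illustration.

```python
def bayes_update(posterior, inputs, channel, y):
    """p(w|y^n) = p(w|y^{n-1}) p(y_n | x_n = f_n(w, y^{n-1})) / p(y_n | y^{n-1})."""
    weights = [pw * channel[x][y] for pw, x in zip(posterior, inputs)]
    total = sum(weights)                   # this is p(y_n | y^{n-1})
    return [w / total for w in weights]

channel = [[0.9, 0.1], [0.2, 0.8]]         # hypothetical channel, channel[x][y] = p(y|x)
lam = min(min(row) for row in channel)     # lambda = min over x, y of p(y|x)
prior = [0.6, 0.3, 0.1]                    # p(w|y^{n-1}) for three messages
inputs = [0, 1, 1]                         # f_n(w, y^{n-1}) for each message w

for y in (0, 1):
    post = bayes_update(prior, inputs, channel, y)
    ok = all(lam * q <= r <= q / lam for q, r in zip(prior, post))
    print(y, [round(r, 4) for r in post], ok)
```

Whatever output occurs, no posterior probability shrinks or grows by more than the factor $1/\lambda$ in a single step.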

Our objective is to lower bound the probability of error of a decoder that decides at time $T$. The key idea is
that a binary hypothesis decision, such as deciding whether or not $W$ lies in some set $\mathcal{G}$, can be made at least as
reliably as a decision on the value of $W$ itself.

Given a set $\mathcal{G}$ of messages, consider deciding between $W \in \mathcal{G}$ and $W \notin \mathcal{G}$ in the following way: given access
to the original decoder's estimate $\hat{W}$, declare that $W \in \mathcal{G}$ if $\hat{W} \in \mathcal{G}$, and declare $W \notin \mathcal{G}$ otherwise. This binary
decision is always correct when the original decoder's estimate $\hat{W}$ is correct. Hence the probability of error of this
(not necessarily optimal) binary decision rule cannot exceed the probability of error of the original decoder, for
any set $\mathcal{G}$. Thus the error probability of the optimal decoder deciding at time $T$ whether or not $W \in \mathcal{G}$ is a lower
bound to the error probability of any decoder that decodes $W$ itself at time $T$. This fact is true even if the set
$\mathcal{G}$ is chosen at a particular stopping time $\tau$ and the error probabilities we are calculating are conditioned on the
observation $Y^\tau$.

For every realization of $Y^\tau$, the message set can be divided into two parts, $\mathcal{G}(Y^\tau)$ and its complement $\mathcal{W} \setminus \mathcal{G}(Y^\tau)$,
in such a way that both parts have an a posteriori probability greater than $\lambda\delta$. The rest of this paragraph describes
how this is possible. From the definition of $\tau$, at time $\tau - 1$ the a posteriori probability of every message is smaller
than $1 - \delta$. This implies that the sum of the a posteriori probabilities of any set of $M - 1$ messages is greater than
$\delta$ at time $\tau - 1$, and by Proposition 2 greater than $\lambda\delta$ at time $\tau$. In particular, $P_e(y^\tau) > \lambda\delta$. We separately consider
the cases $P_e(y^\tau) \le \delta$ and $P_e(y^\tau) > \delta$. In the first case, $P_e(y^\tau) \le \delta$, let $\mathcal{G}(Y^\tau)$ be the set consisting of only the
message with the highest a posteriori probability at time $\tau$. The a posteriori probability of $\mathcal{G}(Y^\tau)$ then satisfies
$\Pr\{\mathcal{G}(Y^\tau) | Y^\tau\} \ge 1 - \delta \ge 1/2 > \lambda\delta$. As argued above, its complement (the remaining $M - 1$ messages) also has
a posteriori probability greater than $\lambda\delta$, thus for this $\mathcal{G}(Y^\tau)$, $\Pr\{\mathcal{G}(Y^\tau) | Y^\tau\} \in [\lambda\delta, 1 - \lambda\delta]$. In the second case,
namely when $P_e(y^\tau) > \delta$, the a posteriori probability of each message is smaller than $1 - \delta$. In this case the set
$\mathcal{G}(Y^\tau)$ may be formed by starting with the empty set and adding messages in arbitrary order until the threshold
$\delta/2$ is exceeded. This ensures that the a priori probability of $\mathcal{G}(Y^\tau)$ is greater than $\lambda\delta$. Notice that the threshold
will be exceeded by at most $1 - \delta$, thus the complement set has an a posteriori probability of at least $\delta/2 > \lambda\delta$.
Thus $\Pr\{\mathcal{G}(Y^\tau) | Y^\tau\} \in [\lambda\delta, 1 - \lambda\delta]$.
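The two-case construction of the set $\mathcal{G}(Y^\tau)$ described above can be sketched as follows. The function names and the example posteriors are hypothetical; "arbitrary order" is taken to be index order. Note that in the first case the guarantee on the complement comes from Proposition 2 and the history up to $\tau - 1$, so it cannot be checked from the posterior at time $\tau$ alone.

```python
def split_messages(posterior, delta):
    """Return the index set G(y^tau) built from the posterior at time tau."""
    best = max(range(len(posterior)), key=posterior.__getitem__)
    if 1 - posterior[best] <= delta:          # first case: P_e(y^tau) <= delta
        return {best}
    chosen, total = set(), 0.0                # second case: P_e(y^tau) > delta
    for w, pw in enumerate(posterior):        # arbitrary order: here, index order
        chosen.add(w)
        total += pw
        if total > delta / 2:                 # stop once the threshold is exceeded
            break
    return chosen

def set_probability(posterior, subset):
    """A posteriori probability of a set of message indices."""
    return sum(posterior[w] for w in subset)

delta = 0.5
posterior = [0.1, 0.1, 0.4, 0.4]              # hypothetical posterior, MAP error 0.6
G = split_messages(posterior, delta)
print(sorted(G), set_probability(posterior, G))
```

In the second case the probability of the chosen set lands between $\delta/2$ and $1 - \delta/2$, since the last message added has probability smaller than $1 - \delta$.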



For any realization of $Y^\tau$ we have the binary hypothesis testing problem, running from $\tau$ until $T$, deciding
whether or not $W \in \mathcal{G}(Y^\tau)$. Notice that the a priori probabilities of the two hypotheses of this binary hypothesis
testing problem are the a posteriori probabilities of $\mathcal{G}(Y^\tau)$ and $\mathcal{W} \setminus \mathcal{G}(Y^\tau)$ at time $\tau$, each of which was shown to
be greater than $\lambda\delta$ in the paragraph above. We apply Lemma 1 with $A = \mathcal{G}(Y^\tau)$ and $N = \mathcal{W} \setminus \mathcal{G}(Y^\tau)$ to lower
bound the probability of error of the binary decision made at time $T$ and, as argued above, we use the result to
lower bound the probability that $\hat{W} \ne W$. Initially everything is conditioned on the channel output up to time $\tau$,
thus

$$\Pr\{\hat{W}(Y^T) \ne W | Y^\tau\} \ge \frac{\lambda\delta}{4} e^{-C_1 E[T - \tau | Y^\tau]}.$$



Taking the expectation of the above expression over all realizations of $Y^\tau$ yields the unconditional probability of
error

$$P_e = E\left[\Pr\{\hat{W}(Y^T) \ne W | Y^\tau\}\right] \ge E\left[\frac{\lambda\delta}{4} e^{-C_1 E[T - \tau | Y^\tau]}\right].$$

Using the convexity of $e^{-x}$ and Jensen's inequality, we obtain

$$P_e \ge \frac{\lambda\delta}{4} e^{-C_1 E[T - \tau]}.$$

Solving for $E[T - \tau]$ yields

$$E[T - \tau] \ge \frac{-\ln P_e - \ln 4 + \ln(\lambda\delta)}{C_1}. \tag{17}$$

Combining Lemma 2 and (17) yields:



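Theorem 1: The expected decoding time $T$ of any variable-length block code for a DMC used with feedback
is lower bounded by

$$E[T] \ge \left(1 - \delta - \frac{P_e}{\delta}\right) \frac{\ln M}{C} - \frac{\ln P_e}{C_1} - \frac{h(\delta)}{C} + \frac{\ln(\lambda\delta) - \ln 4}{C_1}, \tag{18}$$

where $M$ is the cardinality of the message set, $P_e$ the error probability, $\lambda = \min_{x \in \mathcal{X}, y \in \mathcal{Y}} p(y|x)$, and $\delta$ is any
number satisfying $0 < \delta \le \frac{1}{2}$. ■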

Choosing the parameter $\delta$ as $\delta = -\frac{1}{\ln P_e}$ achieves the required scaling for (5).
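How quickly the right-hand side of (18) approaches the leading term $(\ln M)/C - (\ln P_e)/C_1$ can be explored with a short sketch. It assumes a BSC with crossover probability 0.1 (so $\lambda = 0.1$), takes $\delta = -1/\ln P_e$, and couples the two parameters by the arbitrary choice $\ln M = -\ln P_e = s$. The functions work with $\ln M$ and $\ln P_e$ directly to avoid floating-point underflow; their names are hypothetical.

```python
import math

def h(x):
    """Binary entropy function in nats, for 0 < x < 1."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def theorem1_bound(ln_M, ln_pe, C, C1, lam, delta):
    """Right-hand side of (18), in channel uses."""
    pe = math.exp(ln_pe)
    return ((1 - delta - pe / delta) * ln_M / C
            - ln_pe / C1
            - h(delta) / C
            + (math.log(lam * delta) - math.log(4)) / C1)

def leading_term(ln_M, ln_pe, C, C1):
    """(ln M)/C - (ln P_e)/C_1, the dominant term in (5)."""
    return ln_M / C - ln_pe / C1

p = 0.1                                       # assumed crossover probability
C = math.log(2) + p * math.log(p) + (1 - p) * math.log(1 - p)
C1 = (1 - 2 * p) * math.log((1 - p) / p)

ratios = []
for s in (10.0, 1e3, 1e5):                    # ln M = -ln P_e = s (assumed coupling)
    delta = 1 / s                             # delta = -1 / ln P_e
    ratios.append(theorem1_bound(s, -s, C, C1, p, delta) / leading_term(s, -s, C, C1))
print(ratios)
```

The ratio stays below one and tends to one as $s$ grows, which is the $(1 - o(1))$ factor in (5).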



VI. Summary 



We have presented a new derivation of Burnashev's asymptotically tight lower bound to the average delay needed 
for a target error probability when a message is communicated across a DMC used with (channel output) feedback. 
Our proof is simpler than the original, yet provides insight by clarifying the role played by the quantities that 
appear in the bound. Specifically, from the channel coding theorem we expect it to take roughly $\frac{\ln M}{C}$ channel uses
to reduce the probability of error of a MAP decision to some small (but not too small) value. At this point we can
partition the message set in two subsets, such that neither subset has too small an a posteriori probability. From
now on it takes (asymptotically) $\frac{-\ln P_e}{C_1}$ channel uses to decide with probability of error $P_e$ which of the two sets
contains the true message. It takes at least as many channel uses to decide which message was selected and incur 
the same error probability. 

For obvious reasons we may call the two phases the communication and the binary hypothesis testing phase, 
respectively. These two phases exhibit a pleasing similarity to the communication and confirmation phase of the 
optimal scheme proposed and analyzed by Yamamoto and Itoh in [4]. The fact that these two phases play a key 
role in proving achievability as well as in proving that one cannot do better suggests that they are an intrinsic 
component of an optimal communication scheme using variable-length block codes over DMCs with feedback. 